Surgical appropriateness nudges: Developing behavioral science nudges to integrate appropriateness criteria into the decision making of spine surgeons

Background Substantial variation exists in surgeon decision making. In response, multiple specialty societies have established criteria for the appropriate use of spine surgery. Yet few strategies exist to facilitate routine use of appropriateness criteria by surgeons. Behavioral science nudges are increasingly used to enhance decision making by clinicians. We sought to design “surgical appropriateness nudges” to support routine use of appropriateness criteria for degenerative lumbar scoliosis and spondylolisthesis.
Methods The work reflected Stage I of the NIH Stage Model for Behavioral Intervention Development and involved an iterative, multi-method approach, emphasizing qualitative methods. Study sites included two large referral centers for spine surgery. We recruited spine surgeons from both sites for two rounds of focus groups. To produce preliminary nudge prototypes, we examined sources of variation in surgeon decision making (Focus Group 1) and synthesized existing knowledge of appropriateness criteria, behavioral science nudge frameworks, electronic tools, and the surgical workflow. We refined nudge prototypes via feedback from content experts, site leaders, and spine surgeons (Focus Group 2). Concurrently, we collected data on surgical practices and outcomes at study sites. We pilot tested the refined nudge prototypes among spine surgeons, and surveyed them about nudge applicability, acceptability, and feasibility (scale 1–5, 5 = strongly agree).
Results Fifteen surgeons participated in focus groups, giving substantive input and feedback on nudge design. Refined nudge prototypes included: individualized surgeon score cards (frameworks: descriptive social norms/peer comparison/feedback), online calculators embedded in the EHR (decision aid/mapping), a multispecialty case conference (injunctive norms/social influence), and a preoperative check (reminders/salience of information/accountable justification). Two nudges (score cards, preop checks) incorporated data on surgeon practices and outcomes. Six surgeons pilot tested the refined nudges, and five completed the survey (83%). The overall mean score was 4.0 (standard deviation [SD] 0.5), with scores of 3.9 (SD 0.5) for applicability, 4.1 (SD 0.5) for acceptability, and 4.0 (SD 0.5) for feasibility. Conferences had the highest scores (4.3, SD 0.6) and calculators the lowest (3.9, SD 0.4).
Conclusions Behavioral science nudges might be a promising strategy for facilitating incorporation of appropriateness criteria into the surgical workflow of spine surgeons. Future stages in intervention development will test whether these surgical appropriateness nudges can be implemented in practice and influence surgical decision making.


Introduction
Substantial variation exists in surgeon decision making regarding which patients may be good candidates for spine surgery and which spine operations may be best for each patient. For example, orthopedic surgeons and neurosurgeons, when provided with case scenarios for degenerative spinal conditions, varied greatly in when to offer surgery and which procedures to recommend [1–5]. Clinical practice patterns align with these survey findings: sizeable geographic variations exist in rates of common operations, including for lumbar spine procedures, and are not explained by differences in patient populations [2,6–8]. Across hospital referral regions, rates of elective lumbar spinal decompression and fusion have varied by 8.6- and 14-fold, respectively, among Medicare beneficiaries [7]. Among people who underwent laminectomy for lumbar spondylolisthesis, rates of concomitant fusion have ranged from 82% to 98% across states. Operative outcomes and costs also exhibit substantial variation [9]. Use of complex lumbar spinal fusion procedures, instead of less complex ones, has been linked to higher rates of complications [10].
Created in response to unexplained variation in surgical care, appropriateness criteria are tools that provide recommendations about the balance of benefits and risks of a specific procedure for an individual patient [11,12]. Appropriateness is an aspect of the quality of health care. Rigorously developed appropriateness criteria have high validity, reliability, and a long history of use in research. Across many conditions, adherence to such criteria has been associated with improved operative outcomes [11,13–17]. Spine surgery specialty societies have made appropriateness criteria and supporting tools publicly available for diverse procedures [18,19]. Yet, as with guidelines, many barriers hinder surgeons from using appropriateness criteria in routine practice [20].
To overcome such barriers, behavioral science "nudges" have emerged as a promising strategy for enhancing clinical decision making [21]. Behavioral science strives to understand how people make decisions and act on them, and how to overcome common shortfalls in decision making. A nudge is defined as a modest adjustment to the environment that influences behavior in a predictable way without limiting autonomy [22]. Literature has classified diverse nudges according to "frameworks" that reflect how the nudges shape behavior as well as the anticipated strength of different types of nudges. Examples of nudge frameworks include drawing attention to social norms, providing default settings, giving reminders, and providing feedback [21,23–30]. Nudges are widely used in other settings, and evidence has grown that they can be effective tools for shaping clinician behavior, including promoting adherence to guidelines and other recommended standards of care [21–35].
This project sought to conduct exploratory work to generate and begin to refine "surgical appropriateness nudges." These represent a novel intervention that would leverage behavioral science nudges to support spine surgeons' routine use of appropriateness criteria for degenerative lumbar scoliosis and spondylolisthesis and thereby shape the surgeons' decision-making and behavior [36,37]. If nudges are eventually found to be effective at increasing the appropriateness of surgical procedures, operative outcomes may improve.

Materials and methods
The National Institutes of Health (NIH) Stage Model for Behavioral Intervention Development proposes best practices for generating, testing, and implementing interventions that are effective at shaping human behavior in real-world settings. Stage I includes generating and refining an intervention, while later stages involve initial experimental tests that maximize internal validity, larger experimental tests in community settings that maximize external validity, and finally research on strategies for promoting adoption of the now evidence-based intervention [38,39]. The current work corresponded to Stage I. Characteristically, the earliest phase of behavioral intervention development involves an iterative, multi-method approach that leverages literature reviews, taxonomies of intervention elements, input from content experts, qualitative methods such as focus groups drawn from populations that would receive the intervention, and data on current behaviors and outcomes, among other potential inputs and resources. Hallmarks of this stage include allowing the intervention to remain fluid, to permit ongoing refinements in response to evolving findings, and performing initial tests with a small number of highly selected participants [38,40].
Consistent with this stage, qualitative methods were the primary methods used in this research. Qualitative methods enable researchers to explore behaviors and interactions surrounding complex topics in depth [41,42], particularly when the full range of potential responses is not known a priori [43]. Because the proposed application of behavioral science nudges had not been well studied, it was important to remain open to opportunities, challenges, and facilitators to nudge implementation in a surgical setting. Specifically, we conducted focus groups because they are ideal for the initial exploratory phases of intervention design and allow participants' opinions to evolve over the course of discussions with peers [41,42].
Our work adhered to widely accepted standards for rigor in qualitative research [41,42,44], and the Standards for Reporting Qualitative Research (SRQR) guidelines [45]. To recruit participants, site leaders invited 89 orthopedists and neurosurgeons who performed procedures for degenerative lumbar scoliosis and/or spondylolisthesis. We sought 5-7 respondents per site to allow sufficient opportunities to engage each surgeon while maximizing diversity in specialty, career stage, gender, and race/ethnicity [46]. To facilitate the focus groups, we developed discussion guides with questions and probes to elicit debate. Questions were open-ended, allowing latitude in exact wording, sequencing of questions, and use of probes while ensuring important domains were consistently addressed [47].
We conducted focus groups on Microsoft Teams, and each group lasted approximately 60 minutes. All focus groups were audio-recorded and professionally transcribed to ensure accuracy and fidelity to the original discussion. For this study, because we sought to ensure our understanding of specific processes and insights on specific nudges, we focused on ensuring content saturation within each focus group discussion. We continued each line of questioning until participants had no additional input and proactively solicited input from each participant before progressing to the next topic. We utilized commercially available software (Dedoose) to manage coding, retrieval, and analysis. In accordance with principles of grounded theory [48], experts in qualitative methods (co-authors PC and NQ) developed a code structure in stages using systematic, inductive procedures to generate insights grounded in participants' views. We coded the first transcript independently and met to discuss differences in coding, making edits to the code structure as needed to reach consensus. We then divided the remaining transcripts, meeting regularly to discuss any coding challenges and identify emergent and recurrent themes. We utilized the constant comparative approach to identify novel concepts [48], consistently classify emergent themes, and refine or expand existing codes as needed. We began with a review of discussion notes highlighting potential early themes and maintained a running list of themes, making edits and consolidating and refining themes when appropriate.
Institutional Review Boards at the RAND Corporation and the two study sites approved this work. One study site required written informed consent while the other site allowed for verbal consent.

Setting
We partnered with two high-volume regional referral centers for spine surgery that have a strong commitment to the quality and outcomes of surgical care and where leaders sought to implement appropriateness criteria in routine practice. At both sites, neurosurgeons and orthopedists care for complex degenerative spine disorders. Site 1 receives referrals from throughout southern California, Nevada, and Arizona. Site 2 is a capitated, integrated healthcare system with 4.5 million members. Both sites use Epic electronic health record systems (EHRs).

Nudge design process
We sought to develop nudges that spine surgeons would view as applicable to their clinical practice, feasible for incorporation in the surgical workflow, and acceptable to the surgeons personally for routine use.
As seen in Fig 1, the first step included conducting Focus Group 1 and synthesizing existing knowledge. Focus Group 1 involved asking spine surgeons to describe the surgical workflow and identify sources of variation in surgeon decision making because these could represent opportunities to insert nudges to enhance decision making. We considered four kinds of existing knowledge: (1) appropriateness criteria for degenerative lumbar scoliosis and spondylolisthesis [37,49], (2) frameworks for behavioral science nudges, (3) electronic tools that can facilitate clinical decision making, and (4) maps of the surgical workflow at study sites.
In the second step of nudge design, we synthesized focus group results and existing knowledge to propose preliminary nudge prototypes.
The third step involved soliciting feedback on preliminary nudge prototypes from the content experts and site leaders as well as from spine surgeons via Focus Group 2. We incorporated the feedback to produce several refined prototypes.
Concurrent with the intervention development work, we gathered data on spine surgeon practices and outcomes at the study sites.
In the fourth step of nudge development, we pilot tested the refined nudge prototypes among several spine surgeons at each site. Some nudge prototypes incorporated the data on surgeons' practices and outcomes. Finally, we surveyed the testers to assess perceptions of the prototypes and obtained additional feedback. We elaborate upon each step below.
To maximize the likelihood that we considered all potentially effective nudge designs and that insights obtained could translate to other settings, we solicited feedback from multiple external experts throughout the nudge design process. This included an Advisory Board of national experts and stakeholders in surgical quality of care, representatives of national specialty societies engaged in relevant work on appropriateness of spine surgery, and collaborators in Switzerland, among others.

Focus Group 1: Surgeon decision making
In Focus Group 1, we sought a rich understanding of surgeons' beliefs on decision making about appropriateness, including sources of variation as well as implications for practice patterns and patient outcomes. We also presented draft workflow maps and asked participants to make edits and suggestions to better understand key decision making steps in the surgical workflow.

Synthesis of existing knowledge
Appropriateness criteria. In general, these criteria classify the balance of benefits and risks of a specific procedure for hundreds to thousands of "indications profiles" (scenarios) based on patient characteristics (e.g., history, physical exam, test results), thereby precisely defining the treatment options that are safe to offer an individual patient. The criteria provide recommendations about whether, for a specific indications profile, a particular procedure is "appropriate" (benefits exceed risks), "rarely appropriate" (risks exceed benefits, sometimes called "inappropriate"), or of "uncertain appropriateness" (risks/benefits are uncertain or mixed).
In the present work, we leveraged the two sets of appropriateness criteria for degenerative lumbar scoliosis and spondylolisthesis. We sought feedback from study spine surgeons about validity and applicability of the criteria (during pilot tests of nudge prototypes, see below). Additionally, we searched for publications on adherence to these appropriateness criteria and other relevant appropriateness criteria.
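The three-category classification described above can be illustrated with a minimal sketch. It assumes that each combination of indications profile and procedure carries a median expert-panel rating on a 1-9 scale, with bands of 7-9, 4-6, and 1-3, as is conventional for RAND/UCLA-style panels; the text above does not state the cut-offs used for the study criteria. The names `Rating`, `classify`, `CRITERIA`, and `recommend`, as well as the example profile and its ratings, are hypothetical and carry no clinical meaning.

```python
from enum import Enum


class Rating(Enum):
    APPROPRIATE = "appropriate"
    UNCERTAIN = "uncertain appropriateness"
    RARELY_APPROPRIATE = "rarely appropriate"


def classify(median_score: float) -> Rating:
    """Map a median 1-9 panel rating to a category (cut-offs are assumed)."""
    if not 1 <= median_score <= 9:
        raise ValueError("median score must lie between 1 and 9")
    if median_score >= 7:
        return Rating.APPROPRIATE
    if median_score >= 4:
        return Rating.UNCERTAIN
    return Rating.RARELY_APPROPRIATE


# Hypothetical criteria table: indications profile -> procedure -> median rating.
# Real criteria cover hundreds to thousands of profiles; these values are invented.
CRITERIA = {
    ("moderate symptoms", "age under 65", "no major comorbidity"): {
        "decompression alone": 8,
        "decompression with fusion": 5,
        "complex multilevel fusion": 2,
    },
}


def recommend(profile: tuple) -> dict:
    """Return the appropriateness category of each procedure for one profile."""
    ratings = CRITERIA[profile]
    return {procedure: classify(score) for procedure, score in ratings.items()}


if __name__ == "__main__":
    example = ("moderate symptoms", "age under 65", "no major comorbidity")
    for procedure, rating in recommend(example).items():
        print(f"{procedure}: {rating.value}")
```

A lookup table keyed on the full profile mirrors the idea that every scenario is rated in advance, so scoring an individual patient reduces to finding the matching profile.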
Behavioral science nudge frameworks. We conducted a purposive literature review to identify key studies on nudges, frameworks (types of nudges), applications in healthcare, and pros/cons of specific nudges. Sources on behavioral science and nudges in general included widely referenced books and articles [21,22,58–63]. For applications in healthcare, we searched PubMed using the terms "behavioral economic*", "behavioral science", and "nudg*" as title words (March 18, 2022). Because the results from this search appeared to capture most recent literature applicable to healthcare, we did not conduct an independent systematic review. A content expert in behavioral science (co-author JD) provided input at key stages.
Electronic tools to support clinician decision making. We considered two potential platforms for supporting use of nudges: online calculators and tools based in the EHR.
In prior stages of this work, the American Academy of Orthopaedic Surgeons (AAOS) created online appropriateness calculators for the scoliosis and spondylolisthesis appropriateness criteria (Fig 2) [19,64].
To understand how the appropriateness criteria or online calculators could be delivered to clinicians via the EHR, we conferred with a content expert experienced with Epic systems (co-author JP).
Surgical workflow maps. We created draft maps of workflows at both sites by interviewing site leaders (co-authors HB, NA), identifying key decisions and their timing, and creating a graphic. We refined the maps based on input from spine surgeons via Focus Group 1 (above).

Preliminary nudge prototypes
After Focus Group 1, we designed preliminary nudge prototypes in several iterative steps. We reviewed lists of nudge frameworks, prior applications of nudges in healthcare, surgical workflows, sources of variation in surgical practice, and prior deployments of nudges via the EHR. We identified opportunities for nudges to influence practice and to leverage existing or novel tools. We considered evidence on the effectiveness of diverse nudges at shaping clinician behavior and how that might apply to surgeons and complex decisions. We used the PreDICT Checklist, a tool designed to help ensure that surgeons' implicit or explicit preferences/concerns are addressed in designing the nudge [63]. We considered nudges more likely to be acceptable if transparent and convenient to use. We excluded nudges that were poorly suited to complex decision-making processes.
With input from the surgeon site leaders and content experts, we integrated the assembled information, added any newly suggested nudge frameworks or prototypes, considered the pros/cons of each nudge framework, and up-/down-prioritized them based on major pros/cons. Because interventions that involve multiple nudges are often effective [25], we selected several frameworks, suggested prototypes of how they could be operationalized in the surgical workflow, and created tables and graphics.
Fig 2. The user enters specific clinical characteristics for an individual patient. These include symptoms, age, comorbidities, physical exam signs, imaging test results, and other factors that influence the risks and benefits of a specific procedure for that person. The blue font reflects a profile for a hypothetical patient. Procedure Recommendations: The calculator uses the indications profile to score the appropriateness criteria and yields recommendations for common surgical options. For the hypothetical patient, a green circle means that the procedure is "appropriate" (potential benefits exceed risks), a yellow triangle means that the procedure is of "uncertain appropriateness" (risk-benefit ratio is unclear or mixed), and a red X means that the procedure is "rarely appropriate" (risks exceed potential benefits). Note: The American Academy of Orthopaedic Surgeons produced online calculators based on the appropriateness criteria for degenerative lumbar scoliosis and spondylolisthesis. This graphic explains how the appropriateness criteria and associated calculators work. The graphic itself is original artwork by study investigators. https://doi.org/10.1371/journal.pone.0300475.g002

Focus Group 2: Feedback from spine surgeons
Using the same participants and methods as for Focus Group 1, we presented participants with background information on behavioral science and nudge frameworks that had been effective in other healthcare settings. Next, we shared the preliminary nudge prototypes. Surgeons provided substantive, constructive feedback on nudge design; nudges' potential acceptability, feasibility, and effectiveness; their preferences among nudges; and potential refinements.

Refined nudge prototypes
We incorporated the surgeons' feedback and made additional iterative modifications based on assembled information and input from site leaders and content experts.
The three nudges that emerged from Focus Group 2 were not considered strong, according to the framework ranking nudge strength [21], since surgeons would have to actively choose to engage each nudge and to change their behavior. Additionally, none of the nudges would be delivered to surgeons in real time during decision making for individual patients. Given these limitations, we conferred with site leaders and content experts to add a stronger and more active type of nudge that we had not offered to focus group participants.

Surgical practices and outcomes at study sites
In parallel to the nudge design work, we examined surgeon-level variations in practices and outcomes for degenerative lumbar scoliosis and spondylolisthesis at study sites for 2017-2019. In work reported separately [65], we extracted ICD-10-CM and CPT codes from EHR systems, and characterized surgeon-level variations in practice (surgeon volume for study conditions, proportion of procedures involving instrumented fusion) and short-term postoperative outcomes (major in-hospital complications, readmissions).
We also collected preliminary data on adherence to appropriateness criteria for 12 spine surgeons divided equally between study sites: half participated in the pilot tests of nudge prototypes (below) and half were selected at random. For each surgeon, we randomly selected 5 patients (60 total) from the 2017-2019 dataset. We created a self-correcting data collection instrument in Microsoft Excel and then trained clinicians to manually apply the instrument to medical records in the EHR.

Initial pilot testing of refined prototypes
Among the surgeons who participated in the focus groups, we sought six volunteers (three per site) to pilot test four refined nudge prototypes, tailoring the testing approach for each nudge.
One prototype involved the AAOS online calculators. We asked each surgeon to use the calculators with ≥5 clinic patients for whom they were considering surgery (≥30 patients total). In feedback meetings with the study team, the surgeons shared their experiences, commenting on both the appropriateness recommendations and the usability of the calculators.
Two prototypes involved providing the participating surgeons with individualized feedback about their practices and outcomes relative to peers at study sites, and their adherence to appropriateness criteria for individual patients. First, we designed the documents to deliver the feedback. Next, we incorporated the 2017-2019 data from study sites (above) into the documents, and then provided the participants with the individualized feedback.
One nudge involved a conference on appropriateness. Because surgeons routinely participate in similar conferences, we provided the participating surgeons with a detailed one-page description of how the appropriateness conference would function.
After the pilot test activities, we surveyed the six participating surgeons about the refined nudge prototypes. We adapted an existing instrument that is widely used to assess implementation outcomes [66]. For each of the refined nudge prototypes, our survey included eight items related to nudge applicability, feasibility, and acceptability, scored on a 1-5 scale (5 = strongly agree). We distributed the survey via REDCap. We also solicited qualitative feedback about testers' experiences.
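As an illustration of how survey responses of this kind could be summarized, the sketch below computes a mean and sample standard deviation per dimension, along with a response rate. The aggregation rule (averaging each respondent's items within a dimension, then summarizing across respondents) is an assumption, since the text does not specify how items were combined. The function names and the example responses are hypothetical and are not study data.

```python
from statistics import mean, stdev

DIMENSIONS = ("applicability", "feasibility", "acceptability")


def summarize(responses):
    """Return {dimension: (mean, SD)} rounded to one decimal place.

    `responses` is a list with one dict per respondent, mapping each
    dimension to that respondent's 1-5 item scores (assumed layout).
    """
    summary = {}
    for dim in DIMENSIONS:
        # Assumed rule: average items within a respondent first.
        per_respondent = [mean(r[dim]) for r in responses]
        summary[dim] = (
            round(mean(per_respondent), 1),
            round(stdev(per_respondent), 1),  # sample SD (n - 1 denominator)
        )
    return summary


def response_rate(completed: int, invited: int) -> int:
    """Percentage of invited participants who completed the survey."""
    return round(100 * completed / invited)


if __name__ == "__main__":
    # Invented responses for two respondents, for illustration only.
    example = [
        {"applicability": [4, 4], "feasibility": [3, 4, 5], "acceptability": [4, 5, 4]},
        {"applicability": [5, 5], "feasibility": [4, 4, 4], "acceptability": [5, 5, 5]},
    ]
    print(summarize(example))
    print(response_rate(5, 6))
```

With five of six surgeons responding, `response_rate(5, 6)` returns 83.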

Results
Fifteen surgeons voluntarily participated in the focus groups across the two sites. Surgeons were diverse with regard to age, gender, and specialty (e.g., neurosurgery and orthopedic surgery). Six of these surgeons voluntarily participated in the pilot tests.

Focus Group 1: Surgeon decision making
Surgeons believed that institutional protocols and procedures, the availability of specialists, and reimbursement models affected surgical decision making. The main surgeon characteristic highlighted was experience, with more senior surgeons reporting that they became more conservative and relied more on clinical symptoms and signs rather than imaging. Patient factors included patients' priorities, out-of-pocket costs, and age/comorbidities. See S1 File for detailed results of Focus Group 1.

Synthesis of existing knowledge
Appropriateness criteria. Surgeons who participated in the pilot tests did not report concerns about the validity or applicability of the criteria.
We identified two publications that had employed the study appropriateness criteria: the Schulthess Klinik and a Dutch group found that 18-40% of operations were inappropriate, and that appropriateness was associated with better patient-reported outcomes [67,68]. In earlier studies, 14-49% of lumbar spine operations were found to be inappropriate [14,54,68].
Electronic tools to support clinician decision making. Surgeons who pilot tested the AAOS' online calculators suggested several modest improvements in formatting.
We identified three Epic-based platforms that could deliver nudges. A "smart phrase" can facilitate consistent documentation, bring in data from elsewhere in the EHR, and include embedded links (nudge framework: defaults). A best practice alert is a type of interruptive clinical decision support (nudge framework: reminders/alerts). The final platform was putting a link to online appropriateness calculators in an easy-to-find EHR menu (nudge framework: decision aid/mapping).

Preliminary nudge prototypes
Our iterative process produced five preliminary prototypes that incorporated several nudge frameworks: (1) individualized score cards (descriptive norm/peer comparison/feedback) that report a surgeon's practices and outcomes relative to peers; (2) online appropriateness calculators with links embedded in the EHR (decision aid/mapping); (3) structured note templates (defaults) that prompt surgeons to document the clinical variables needed to assess appropriateness; (4) prompts to document a rationale for any exceptions (accountable justification) if a planned operation was not aligned with appropriateness criteria; and (5) a multispecialty case conference (injunctive norm/social influence) to discuss exemplar cases, appropriateness criteria, and recent publications. We considered and excluded additional nudge frameworks because they were poorly suited to highly complex decision making.

Focus Group 2: Feedback from spine surgeons
Surgeons provided in-depth feedback and narrowed the list of five preliminary nudge prototypes to three that appeared most promising: embedding links to the online calculators in the EHR, individualized surgeon score cards, and multispecialty case conferences (S3 File includes detailed results of Focus Group 2).
Individualized score cards. This nudge generated much discussion among participants: surgeons felt that score cards should be clear and transparent in their development, argued that they should not be publicly reported, and were most interested in comparing themselves with close peers.
Online appropriateness calculators. Surgeons generally viewed these favorably. A barrier to use was that surgeons would need to remember to access the calculators, sometimes from diverse computer terminals.
Structured note template. Surgeons had mixed reactions, reporting that documentation practices varied greatly. Several surgeons did not write visit notes themselves, but rather asked spine fellows and physician assistants to do so, and variable documentation practices would hamper implementation.
Rationale for exceptions. Focus group participants did not have strong reactions for or against this, but no promising opportunity to incorporate this in the workflow had been identified.
Multispecialty case conference. Surgeons were receptive, based on their experiences with case conferences from residency/fellowship training. However, they doubted that they often performed "rarely appropriate" procedures or omitted highly "appropriate" ones. They did often wrestle with the many situations where appropriateness was uncertain, and they wanted these to be discussed in case conferences too.

Refined nudge prototypes
Based on Focus Group 2, three nudges emerged as most promising: online calculators embedded in the EHR, individualized surgeon score cards, and multispecialty case conferences. Because these nudges are of only weak to moderate strength, we later added a fourth, stronger nudge that we had not presented to the focus group participants: a preoperative ("preop") check. Table 1 summarizes the refined nudge prototypes, while Fig 3 shows how they fit into the spine surgery workflow and ultimately shape surgical practice and outcomes. The S4 File includes a table summarizing the nudge frameworks considered in this work, those included vs. excluded, and a rationale for each decision. Additionally, the S5 File includes details about the refined nudge prototypes including examples.
Individualized score cards. In addition to leveraging the descriptive norm, peer comparison, and feedback frameworks, viewing individualized data on performance may encourage surgeons to respond to the other nudges. Although we considered "framing" the information to emphasize the negatives, other literature suggested that the score cards would need to provide feedback in a non-judgmental, constructive way to avoid triggering defensive reactions [70,72].
Online appropriateness calculators. By themselves, decision aids are weak interventions since surgeons have to remember to use them. However, as part of the other three nudges, surgeons would receive reminders about the online calculators, how to find them in the EHR, and why they would be useful.
Multispecialty case conference. Due to the roles of social influence and norms in shaping surgical practice, a case conference might influence the appropriateness of surgery both directly and indirectly (i.e., beyond the patients presented).
Preop check. Implementing the preop check requires a readily available, continuously updated data stream that can detect when surgeons are considering surgery for specific patients. The operating room schedule represents such a data stream. Accordingly, the preop check is feasible late in the surgical workflow, after the surgeon has offered surgery to the patient. Nonetheless, the preop check could still guide surgeons to the procedure with the best risk-benefit ratio, prompt surgeons to document a rationale if the surgical plan diverges from the appropriateness criteria (an "accountable justification"), and/or modify their decision making for subsequent patients.

Surgical practices and outcomes at study sites
Across the study sites in 2017-2019, 89 spine surgeons performed 2,481 eligible operations. The surgeons exhibited substantial variation in operative volume, use of instrumented fusion, and postoperative outcomes [65]. At the two sites, the median eligible operative volume was 9 (range 1-119) and 14 (range 1-115), respectively. Higher-volume individual surgeons (≥15 eligible procedures) used instrumented fusion in 0% to >90% of their operations for scoliosis and 9% to 100% of their operations for spondylolisthesis, and they had major in-hospital complications after 0% to 25% of their operations for scoliosis and 0% to 14% of their operations for spondylolisthesis (reported in detail elsewhere).
Of the 60 surgical patients in whom we tested methods for scoring appropriateness criteria, 30 had sufficient information documented in the EHR to score. Surgery was discordant with the appropriateness criteria in 16 (53%): 12 (40%) had inappropriate surgery and 4 (13%) did not undergo the best procedure. These rates are similar to published literature (above).
Fig 3. Refined surgical appropriateness nudges in preoperative workflow. Workflow: After being referred a patient with degenerative lumbar scoliosis and/or spondylolisthesis (white oval), the surgeon assesses the risks and benefits of performing surgery and of alternative surgical procedures (grey rectangle). Based on this assessment, the surgeon discusses treatment options with the patient (grey rectangle). Based on this discussion, the surgeon then operates or not (white diamond) and, if operating, selects among multiple procedure options (white diamond). The surgical care provided may or may not align with appropriateness criteria recommendations (white oval). The appropriateness of surgical care influences the patient's health outcomes (white oval). Nudges: Individualized Score Cards and the Multispecialty Case Conferences (black rectangles) can inform the surgeon's overall approach to assessing risks and benefits. Surgeons can access Online Calculators and Multispecialty Case Conference Database during the assessment of risks/benefits and selection of procedure for an individual patient. The Preop Check delivers recommendations to the surgeon shortly after the procedure has been scheduled. https://doi.org/10.1371/journal.pone.0300475.g003
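The adherence percentages reported in this subsection can be reproduced with simple arithmetic, using the 30 scorable patients as the denominator. The short sketch below uses only the counts stated in the text; the function and variable names are illustrative.

```python
def pct(count: int, total: int) -> int:
    """Whole-number percentage of `count` out of `total`."""
    return round(100 * count / total)


sampled = 60        # patients sampled for record review
scorable = 30       # records with enough documentation to score
inappropriate = 12  # had inappropriate surgery
not_best = 4        # did not undergo the best procedure
discordant = inappropriate + not_best  # 16 discordant with the criteria

print(f"scorable: {pct(scorable, sampled)}% of sampled patients")
print(f"discordant: {pct(discordant, scorable)}% of scorable patients")
print(f"inappropriate: {pct(inappropriate, scorable)}% of scorable patients")
print(f"not best procedure: {pct(not_best, scorable)}% of scorable patients")
```

Note that the denominator is the 30 scorable patients rather than all 60 sampled, which is why 16 discordant cases correspond to 53%.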

Initial pilot testing of refined prototypes
Of the six surgeons who participated in pilot tests, five completed the survey (83%). Among respondents, the mean overall score was 4.0 (standard deviation [SD] 0.5), showing good overall support for the nudges (Fig 4). Mean scores by dimension were: applicability 3.9 (SD 0.5), feasibility 4.0 (SD 0.5), and acceptability 4.1 (SD 0.5). Conferences had the highest mean score, 4.3 (SD 0.6), and calculators had the lowest, 3.9 (SD 0.4). See S6 File for items and responses.

Discussion
In this study, we designed surgical appropriateness nudges using an iterative, primarily qualitative process that leveraged published literature, content experts, spine surgery leaders, and focus groups with spine surgeons at two regional referral centers, as well as data on baseline surgical practices and outcomes. Surgeons valued the ability to assess the risks and benefits of surgery for individual patients and were amenable to surgical appropriateness nudges that might help them achieve that. In particular, surgeons thought that three nudges appeared promising: individualized surgeon score cards, online appropriateness calculators, and a multispecialty case conference and database. To strengthen the set of nudges, the research team added a preop check that would give surgeons feedback about appropriateness in real time before surgery. Surgeons who pilot tested these four nudges rated them on average as applicable, acceptable, and feasible.
Despite the increasing use of nudges in clinical contexts, very few studies have addressed surgical care. Studies have addressed hand hygiene in surgical ICUs, lung-protective ventilation strategies during general anesthesia, use of chlorhexidine in ventilated patients in surgical ICUs, prescribing opioids after surgery, and prescribing perioperative antibiotics [21,23–30]. Our work appears innovative in its application of nudges to surgical appropriateness. By engaging surgeons in nudge development, we aimed to improve the reception of nudges among this group.
Sethi et al. reported a prior systematic effort to assure the appropriateness of surgery for adult scoliosis patients [77]. As one part of a multifaceted quality improvement intervention, the investigators held a case conference where clinicians from neurosurgery, anesthesia, orthopedics, internal medicine, behavioral health, and nursing met to collaboratively decide on the appropriateness of surgery for each patient. Intervention patients had half as many major 30-day complications as historical controls. Our proposed case conference nudge is generally similar, but differs in that surgeons' participation would be optional, proceedings would leverage existing appropriateness criteria, and a searchable database would make conference deliberations available for reference [77,78].
Our work has several limitations. As would be expected of a Stage I effort to develop a behavioral intervention, we only included a small number of surgeons from two sites in the same geographic area. However, the institutional structures and cultures were distinct, and we sought input from a wide range of stakeholders and experts outside of the study sites. Developing nudge prototypes involved subjective judgments, and we did not present all nudge frameworks during the focus groups; we emphasized frameworks where implementation strategies were clearer and excluded those poorly suited to complex decision making. We designed the preop check after the focus groups, which limited the feedback obtained on this nudge so far. Nonetheless, the surgeons' ratings of this nudge were favorable overall and comparable to those of the other nudges.
This work is the first step in developing and testing surgical appropriateness nudges for degenerative lumbar scoliosis and spondylolisthesis. Consistent with the NIH Stage Model for Behavioral Intervention Development, subsequent steps include preparing the nudges for implementation, implementing them at these study sites, making further refinements, and then conducting early tests of effectiveness at these study sites. Later stages would involve testing effectiveness in diverse settings to maximize external validity as well as dissemination and implementation research [38]. While the current nudge design process largely relied on qualitative methods, work is needed to demonstrate that these nudges can be implemented in practice, followed by a future randomized controlled trial to test whether the surgical appropriateness nudges can shape surgical decision making. If effective at improving surgeon decision making, nudges may secondarily lower major complication rates and improve patient-reported outcomes.

Fig 2. Online appropriateness calculator for degenerative lumbar scoliosis. Indications Profile: The user enters specific clinical characteristics for an individual patient. These include symptoms, age, comorbidities, physical exam signs, imaging test results, and other factors that influence the risks and benefits of a specific procedure for that person. The blue font reflects a profile for a hypothetical patient. Procedure Recommendations: The calculator uses the indications profile to score the appropriateness criteria and yields recommendations for common surgical options. For the hypothetical patient, a green circle means that the procedure is "appropriate" (potential benefits exceed risks), a yellow triangle means that the procedure is of "uncertain appropriateness" (risk-benefit ratio is unclear or mixed), and a red X means that the procedure is "rarely appropriate" (risks exceed potential benefits). Note: The American Academy of Orthopaedic Surgeons produced online calculators based on the appropriateness criteria for degenerative lumbar scoliosis and spondylolisthesis. This graphic explains how the appropriateness criteria and associated calculators work. The graphic itself is original artwork by study investigators.
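To make the step from a scored criterion to a displayed symbol concrete, here is a minimal sketch. It assumes the 1 to 9 rating scale conventionally used for appropriateness criteria (1 to 3 rarely appropriate, 4 to 6 uncertain, 7 to 9 appropriate); the caption does not state the scale, and the function name, procedure labels, and ratings are hypothetical rather than taken from the AAOS calculators.

```python
from typing import Dict, Tuple

def classify(rating: int) -> Tuple[str, str]:
    """Map a 1-9 appropriateness rating to (category, symbol).

    The cut points are an assumption based on the conventional
    1-9 appropriateness scale, not on the figure itself.
    """
    if not 1 <= rating <= 9:
        raise ValueError("rating must be between 1 and 9")
    if rating >= 7:
        return ("appropriate", "green circle")
    if rating >= 4:
        return ("uncertain appropriateness", "yellow triangle")
    return ("rarely appropriate", "red X")

# Hypothetical ratings for one patient's indications profile.
example_ratings: Dict[str, int] = {
    "procedure A": 8,
    "procedure B": 5,
    "procedure C": 2,
}

recommendations = {name: classify(r) for name, r in example_ratings.items()}
for name, (category, symbol) in recommendations.items():
    print(f"{name}: {category} ({symbol})")
```

In a real calculator the rating would be looked up from the published criteria for the entered indications profile; the sketch covers only the final classification step.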

Fig 3. Refined surgical appropriateness nudges in preoperative workflow. Workflow: After being referred a patient with degenerative lumbar scoliosis and/or spondylolisthesis (white oval), the surgeon assesses the risks and benefits of performing surgery and of alternative surgical procedures (grey rectangle). Based on this assessment, the surgeon discusses treatment options with the patient (grey rectangle). Based on this discussion, the surgeon then operates or not (white diamond) and, if operating, selects among multiple procedure options (white diamond). The surgical care provided may or may not align with appropriateness criteria recommendations (white oval). The appropriateness of surgical care influences the patient's health outcomes (white oval). Nudges: Individualized Score Cards and the Multispecialty Case Conferences (black rectangles) can inform the surgeon's overall approach to assessing risks and benefits. Surgeons can access Online Calculators and Multispecialty Case Conference Database during the assessment of risks/benefits and selection of procedure for an individual patient. The Preop Check delivers recommendations to the surgeon shortly after the procedure has been scheduled. https://doi.org/10.1371/journal.pone.0300475.g003

Table 1. Refined prototypes of surgical appropriateness nudges: Design and characteristics.
Design: Individualized Score Cards
Characteristics: Each surgeon's adherence to appropriateness criteria as well as practices (use of instrumented fusion) and outcomes (major in-hospital complications) relative to peers, framed to draw the surgeon's attention. Score cards would be updated regularly, delivered privately, and mask the identities of peers. Reviewing score card data may encourage surgeons to respond to other nudges.
Based on a ranking of the strength of nudge frameworks in an article by Last et al. (2021) [21].
† Descriptive norms represent how people actually behave in practice (in this case, their performance relative to peers) [75,76].
‡ Injunctive norms reflect perceptions of what behaviors are approved or disapproved by others (in this case, a multidisciplinary team of experts) [75,76].